19830: Don't allocate to compare string slices #185

martin-augment · 2026-01-15T12:53:45Z

19830: To review by AI

Replace usages of str::to_lowercase().as_str() + patterm matching with str::eq_ignore_ascii_case(&str) to avoid allocating temporary String.

coderabbitai · 2026-01-15T12:54:16Z

Walkthrough

This pull request refactors string comparison logic across multiple files to use case-insensitive matching via eq_ignore_ascii_case instead of to_lowercase-based comparisons. Changes occur in file type detection, object store mode parsing, print options handling, and date part extraction. The modifications replace match expressions and equality checks that relied on lowercasing strings with direct case-insensitive comparisons. All changes preserve existing behavior and control flow patterns while altering implementation approaches.

✨ Finishing touches

📝 Generate docstrings

🧹 Recent nitpick comments

datafusion/functions/src/datetime/date_part.rs (1)
250-253: Inconsistent: is_epoch() still allocates via to_lowercase().

This function still uses to_lowercase() which allocates, inconsistent with the PR's goal. Consider updating it to use eq_ignore_ascii_case() as well.
♻️ Suggested fix
 fn is_epoch(part: &str) -> bool {
     let part = part_normalization(part);
-    matches!(part.to_lowercase().as_str(), "epoch")
+    part.eq_ignore_ascii_case("epoch")
 }

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cedd2b and 5ec88b5.

📒 Files selected for processing (4)

datafusion-cli/src/exec.rs
datafusion-cli/src/object_storage/instrumented.rs
datafusion-cli/src/print_options.rs
datafusion/functions/src/datetime/date_part.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Cursor Bugbot
GitHub Check: Analyze (rust)

🔇 Additional comments (4)

datafusion-cli/src/object_storage/instrumented.rs (1)

122-130: LGTM!

The refactoring from match with to_lowercase() to eq_ignore_ascii_case() correctly avoids the temporary String allocation while preserving the same case-insensitive matching behavior. The existing tests at lines 843-853 already validate mixed-case inputs like "dIsABleD", "SUmMaRy", and "TRaCe".

datafusion-cli/src/exec.rs (1)

401-410: LGTM!

The case-insensitive file extension matching using eq_ignore_ascii_case() is correct and avoids unnecessary allocations. File extensions are ASCII-only, making this an appropriate choice.

datafusion-cli/src/print_options.rs (1)

51-54: LGTM!

Clean refactoring that avoids allocating a lowercase string for each comparison. The chained eq_ignore_ascii_case() calls correctly handle the three keywords ("inf", "infinite", "none") for unlimited rows.

datafusion/functions/src/datetime/date_part.rs (1)

217-231: LGTM!

The refactoring correctly replaces the match on a lowercased string with case-insensitive comparisons, avoiding allocation while preserving behavior for all date part variants.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-01-15T12:55:22Z

Code Review: PR #185 - Don't allocate to compare string slices

Summary

This PR optimizes string comparisons by replacing to_lowercase().as_str() pattern with eq_ignore_ascii_case(), eliminating unnecessary heap allocations. This is a solid performance improvement for hot path code.

✅ Strengths

Performance Improvement: The change eliminates unnecessary allocations when comparing strings case-insensitively. to_lowercase() creates a new String on the heap, while eq_ignore_ascii_case() performs comparison in-place.
Consistent Optimization Pattern: All four modified files follow the same optimization pattern, making the changes uniform and easy to review.
Functionally Equivalent: For ASCII-only comparisons (which these file extensions and SQL keywords are), eq_ignore_ascii_case() is semantically identical to to_lowercase().as_str().

🔍 Code Quality Observations

1. Code Readability Trade-off

The original match statement pattern was more compact and idiomatic Rust:

match ext.to_lowercase().as_str() {
    "csv" => Some(ConfigFileType::CSV),
    "json" => Some(ConfigFileType::JSON),
    ...
}

The new if-else chain is more verbose but avoids allocation:

if ext.eq_ignore_ascii_case("csv") {
    Some(ConfigFileType::CSV)
} else if ext.eq_ignore_ascii_case("json") {
    ...
}

Recommendation: This trade-off is acceptable for performance-sensitive code. Consider adding a brief comment in hot-path functions explaining the optimization.

2. Pattern Consistency Across Codebase

I found 88 occurrences of to_lowercase() across 39 files in the codebase. Notable remaining instances:

datafusion/functions/src/datetime/date_trunc.rs:88 - Similar pattern could be optimized
datafusion/common/src/format.rs:174, 223 - Multiple match statements
datafusion/common/src/config.rs:1772 - Config parsing

Recommendation: Consider filing a follow-up issue to audit and optimize other instances, particularly in hot paths.

🧪 Test Coverage

Existing Coverage: ✅ Good

The date_part function has extensive test coverage in datafusion/sqllogictest/test_files/datetime/date_part.slt
Tests include case variations: EXTRACT(quarter ...), EXTRACT("quarter" ...), EXTRACT('quarter' ...)
CLI file type parsing is tested through integration tests

Gap: The specific case-insensitive behavior is implicitly tested but not explicitly documented.

Recommendation: No additional tests needed - existing tests adequately cover the functionality. The optimization doesn't change behavior.

⚡ Performance Considerations

Positive Impact:

Eliminates heap allocations in string comparison operations
Reduces memory pressure and GC overhead (though Rust doesn't have GC, less allocation = less work for the allocator)
Cache-friendly: In-place comparison is more CPU cache-efficient

Magnitude:

For individual calls: Small but measurable improvement (saves one allocation per comparison)
For hot paths (like date_part in query processing): Can add up significantly in tight loops over large datasets

Benchmark Recommendation: For date_part.rs:217-230, consider adding a microbenchmark if this is in a hot path, though the improvement is straightforward enough that it's likely not necessary.

🔒 Security Considerations

No security concerns identified.

The eq_ignore_ascii_case() method is safe and has identical behavior to to_lowercase() for ASCII characters. All the strings being compared (file extensions, SQL keywords) are ASCII-only.

Note: If any future strings contain non-ASCII characters, eq_ignore_ascii_case() and to_lowercase() may behave differently. However, for the current use cases (file extensions: "csv", "json", "parquet"; SQL parts: "quarter", "epoch", etc.), this is not a concern.

🐛 Potential Issues

None identified. The changes are straightforward and correct.

📝 Minor Observations

Whitespace change in date_part.rs:57: Line 57 had trailing whitespace removed. ✅ Good cleanup.
Error message preservation: In instrumented.rs:130, the error message still correctly uses the original string {s}, not a lowercased version. ✅ Correct.
Return statement formatting: The semicolon placement in date_part.rs:232 is correct - it terminates the expression statement that creates arr.

✨ Recommendations

✅ Approve and Merge: This is a clean optimization with no functional changes.
Consider Follow-up: Create an issue to audit other to_lowercase().as_str() patterns in performance-sensitive areas:
- date_trunc.rs:88
- format.rs:174, 223
- Other datetime functions
Documentation: If this is part of a broader performance optimization effort (ref: issue Don't allocate to compare string slices apache/datafusion#19830), consider documenting this pattern in contributing guidelines to prevent regression.

Final Verdict: ✅ LGTM (Looks Good To Me)

This PR makes a solid performance improvement without sacrificing correctness. The code changes are clean, consistent, and well-scoped.

🤖 Generated with Claude Code

augmentcode · 2026-01-15T12:55:41Z

🤖 Augment PR Summary

Summary: Removes temporary string allocations when doing case-insensitive comparisons in several parsing paths.

Changes:

Replaced to_lowercase()-based matching with eq_ignore_ascii_case for config file extension detection in the CLI
Updated object store instrumentation mode parsing to avoid allocating while matching keywords
Adjusted max-rows parsing (inf/infinite/none) to use allocation-free comparisons
Updated date_part special-case handling to use allocation-free comparisons for supported parts

Technical Notes: Uses eq_ignore_ascii_case (ASCII-only case-insensitive comparison), which is appropriate for these ASCII keywords/extensions and avoids creating intermediate lowercase strings.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Don't allocate to compare strings

5ec88b5

Replace usages of str::to_lowercase().as_str() + patterm matching with str::eq_ignore_ascii_case(&str) to avoid allocating temporary String.

augmentcode bot reviewed Jan 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

19830: Don't allocate to compare string slices #185

19830: Don't allocate to compare string slices #185

martin-augment commented Jan 15, 2026

Uh oh!

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 15, 2026

Uh oh!

augmentcode bot commented Jan 15, 2026

Uh oh!

augmentcode bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

19830: Don't allocate to compare string slices #185

Are you sure you want to change the base?

19830: Don't allocate to compare string slices #185

Conversation

martin-augment commented Jan 15, 2026

Uh oh!

coderabbitai bot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

claude bot commented Jan 15, 2026

Code Review: PR #185 - Don't allocate to compare string slices

Summary

✅ Strengths

🔍 Code Quality Observations

🧪 Test Coverage

⚡ Performance Considerations

🔒 Security Considerations

🐛 Potential Issues

📝 Minor Observations

✨ Recommendations

Final Verdict: ✅ LGTM (Looks Good To Me)

Uh oh!

augmentcode bot commented Jan 15, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Jan 15, 2026 •

edited

Loading